12 research outputs found

    DancingLines: An Analytical Scheme to Depict Cross-Platform Event Popularity

    Nowadays, events usually burst and propagate online through multiple modern media such as social networks and search engines. Various studies discuss event dissemination trends on an individual medium, while few focus on event popularity analysis from a cross-platform perspective. Challenges come from the vast diversity of events and media, limited access to aligned datasets across different media, and a great deal of noise in the datasets. In this paper, we design DancingLines, an innovative scheme that captures and quantitatively analyzes event popularity between pairwise text media. It contains two models: TF-SW, a semantic-aware popularity quantification model based on an integrated weight coefficient leveraging Word2Vec and TextRank; and wDTW-CD, a pairwise event popularity time series alignment model, adapted from Dynamic Time Warping, that matches different event phases. We also propose three metrics to interpret event popularity trends between pairwise social platforms. Experimental results on eighteen real-world event datasets from an influential social network and a popular search engine validate the effectiveness and applicability of our scheme. DancingLines is demonstrated to have broad application potential for discovering knowledge about various aspects of events and different media.
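
    For readers unfamiliar with Dynamic Time Warping, on which the abstract says wDTW-CD is based, the following is a minimal sketch of the classic, unweighted algorithm. It is not the wDTW-CD model itself; the function name `dtw_distance` and the two toy popularity series are hypothetical, chosen only to show how a time-shifted peak is aligned.

```python
import math


def dtw_distance(a, b):
    """Classic dynamic time warping cost between two numeric sequences.

    Plain textbook DTW with an absolute-difference local cost; the
    weighting and other refinements of wDTW-CD are NOT reproduced here.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible predecessor cells
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# Hypothetical daily popularity of one event on two platforms:
# the second series peaks one day later than the first.
search_engine = [0, 1, 3, 7, 3, 1, 0]
social_network = [0, 0, 1, 3, 7, 3, 1]

pointwise = sum(abs(x - y) for x, y in zip(search_engine, social_network))
print("pointwise L1 distance:", pointwise)
print("DTW distance:", dtw_distance(search_engine, social_network))
```

    In this toy case the warped cost is far smaller than the day-by-day difference, because the warping path absorbs the one-day lag between the two series.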

    Extreme value prediction: an application to sport records

    Extreme value theory studies the extreme deviations from the central portion of a probability distribution. Results in this field are of considerable importance in assessing the risk that characterises rare events, such as a collapse of the stock market, earthquakes of exceptional intensity, or floods. In recent years, the application of extreme value theory to the prediction of sport records has received increased interest from the scientific community. In this work we address the problem of constructing prediction limits for series of extreme values coming from sport data. We propose the use of a calibration procedure applied to the generalised extreme value distribution, in order to obtain a proper predictive distribution for future records. The calibrated procedure is applied to series of real data related to sport records. In particular, we consider sequences of annual maxima for different athletic events. Using the proposed calibrated predictive distribution, we show how to correctly predict the probability of future records and we discuss the existence and interpretation of ultimate records.
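
    As background for the abstract above, the generalised extreme value distribution can be written in its standard form as follows. The symbols (location \mu, scale \sigma, shape \xi) are the usual textbook parametrisation and are an assumption here, since the abstract does not state which parametrisation the authors use.

```latex
% GEV distribution function for block maxima (standard parametrisation)
G(x) = \exp\left\{ -\left[ 1 + \xi \, \frac{x - \mu}{\sigma} \right]^{-1/\xi} \right\},
\qquad 1 + \xi \, \frac{x - \mu}{\sigma} > 0, \quad \sigma > 0 .

% When \xi < 0 the support is bounded above, with finite upper endpoint
x^{*} = \mu - \frac{\sigma}{\xi} .
```

    Under this parametrisation, a negative shape parameter implies a finite upper endpoint, which is one natural way to read the notion of an "ultimate record" for annual maxima; for events where the record is a minimum (for example, running times), the same form would apply to the negated values.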

    Simultaneous calibrated prediction intervals for time series

    This paper deals with simultaneous prediction for time series models. In particular, it presents a simple procedure which gives well-calibrated simultaneous predictive intervals with coverage probability equal to or close to the target nominal value. Although the exact computation of the proposed intervals is usually not feasible, an approximation can be easily obtained by means of a suitable bootstrap simulation procedure. This new predictive solution is much simpler to compute than those already proposed in the literature based on asymptotic calculations. An application of the bootstrap calibrated procedure to first-order autoregressive models is presented.

    Robust prediction limits based on M-estimators

    We discuss a robust solution to the problem of prediction. Extending Barndorff-Nielsen and Cox [1996. Prediction and asymptotics. Bernoulli 2, 319-340] and Vidoni [1998. A note on modified estimative prediction limits and distributions. Biometrika 85, 949-953], we propose improved prediction limits based on M-estimators. To compute them, the expressions of the bias and variance of an M-estimator are required. In view of this, a general asymptotic approximation for the bias of an M-estimator is derived. Moreover, by means of comparative studies in the context of affine transformation models, we show that the proposed robust procedure for prediction can be successfully used in a parametric setting.
    Keywords: Bias; Influence function; Prediction; Robustness; Scale-regression model

    A characterization of monotone and regular divergences

    Preprint submitted for publication in a scientific journal: Annals of the Institute of Statistical Mathematics (1998), volume 50, no. 3, pp. 433–450. [http://doi.org/10.1023/A:1003569210573] In this paper we characterize the local structure of monotone and regular divergences, which include f-divergences as a particular case, by giving their Taylor expansion up to fourth order. We extend a previous result obtained by Čencov, using the invariance properties of Amari's α-connections.
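
    For reference, the f-divergences mentioned above are commonly defined as follows. This is the standard textbook definition, stated as an assumption: the paper may use a different but equivalent convention (some authors swap the roles of p and q), and the symbols p, q, f and \mu are introduced here only for illustration.

```latex
% f-divergence between densities p and q with respect to a measure \mu,
% for a convex function f with f(1) = 0
D_f(p \,\|\, q) = \int q(x) \, f\!\left( \frac{p(x)}{q(x)} \right) d\mu(x) .

% Choosing f(t) = t \log t recovers the Kullback-Leibler divergence
D_{\mathrm{KL}}(p \,\|\, q) = \int p(x) \, \log \frac{p(x)}{q(x)} \, d\mu(x) .
```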

    A Study on Microblog and Search Engine User Behaviors: How Twitter Trending Topics Help Predict Google Hot Queries

    Once every five minutes, Twitter publishes a list of trending topics by monitoring and analyzing tweets from its users. Similarly, Google makes available an hourly list of hot queries that have been issued to the search engine. We claim that social trends sparked by Twitter may help explain and predict web trends derived from Google. Indeed, we argue that information flowing in near real time across the Twitter social network could anticipate the set of topics that users will later search for on the Web. In this work, we analyze the time series derived from the daily volume index of each trend, from either Twitter or Google. Our study on a real-world dataset reveals that about 26% of the trending topics arising from Twitter "as is" are also found as hot queries issued to Google. Also, we find that about 72% of the similar trends appear first on Twitter. We therefore assess the relation between comparable Twitter and Google trends by testing three classes of time series regression models. First, we find that Google on its own is not able to effectively predict the time behavior of its trends: autoregressive models fitted to the time series of Google trends perform poorly. On the other hand, we validate the forecasting power of Twitter by showing that models which use Google as the dependent variable and Twitter as the explanatory variable retain the past values of Twitter as significant 60% of the time. Moreover, we discover that a Twitter trend causes a similar Google trend to occur later about 43% of the time. In the end, we show that the best-performing models are those using past values of both Twitter and Google.

    On the relationships between α-connections and the asymptotic properties of predictive distributions

    Preprint submitted for publication in a scientific journal: Bernoulli, 1999, vol. 5, no. 1, pp. 163-176. [http://projecteuclid.org/euclid.bj/1173707099] In a recent paper Komaki studies the second-order asymptotic properties of predictive distributions, using the Kullback-Leibler divergence as the loss function. He shows that estimative distributions with asymptotically efficient estimators can be improved by predictive distributions that do not belong to the model. The model is assumed to be a multidimensional curved exponential family. In this paper we generalize the result by taking any f-divergence as the loss function. A relationship emerges between the α-connections and the optimal predictive distributions. In particular, when an α-divergence is used to measure the goodness of a predictive distribution, the optimal shift of the estimative distribution is related to α-covariant derivatives. The expression we obtain for the asymptotic risk is also useful for studying the higher-order asymptotic properties of an estimator in the mentioned class of loss functions.